Bacău County
Reddit is all you need: Authorship profiling for Romanian
Ştefănescu, Ecaterina, Jerpelea, Alexandru-Iulius
Authorship profiling is the process of identifying an author's characteristics based on their writings. This centuries-old problem has become more intriguing with recent developments in Natural Language Processing (NLP). In this paper, we introduce a corpus of short texts in the Romanian language, annotated with author characteristic keywords; to our knowledge, it is the first of its kind. To build it, we exploit the social media platform Reddit. We leverage its thematic, community-based structure (subreddits), which offers information about the author's background. We infer a user's demographics and some broad personal traits, such as age category, employment status, interests, and social orientation, based on the subreddit and other cues. We thus obtain a corpus of 23k+ samples, extracted from 100+ Romanian subreddits. We analyse our dataset and, finally, fine-tune and evaluate Large Language Models (LLMs) to establish baseline capabilities for authorship profiling using the corpus, indicating the need for further research in the field. We publicly release all our resources.
- Europe > Romania > Vest Development Region > Timiș County > Timișoara (0.05)
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- Europe > Romania > Sud-Vest Oltenia Development Region > Dolj County > Craiova (0.04)
- (14 more...)
Membership Inference Attacks Against In-Context Learning
Wen, Rui, Li, Zheng, Backes, Michael, Zhang, Yang
Adapting Large Language Models (LLMs) to specific tasks introduces concerns about computational efficiency, prompting an exploration of efficient methods such as In-Context Learning (ICL). However, the vulnerability of ICL to privacy attacks under realistic assumptions remains largely unexplored. In this work, we present the first membership inference attack tailored for ICL, relying solely on generated texts without their associated probabilities. We propose four attack strategies tailored to various constrained scenarios and conduct extensive experiments on four popular large language models. Empirical results show that our attacks can accurately determine membership status in most cases, e.g., 95% accuracy advantage against LLaMA, indicating that the associated risks are much higher than those shown by existing probability-based attacks. Additionally, we propose a hybrid attack that synthesizes the strengths of the aforementioned strategies, achieving an accuracy advantage of over 95% in most cases. Furthermore, we investigate three potential defenses targeting data, instruction, and output. Results demonstrate that combining defenses from orthogonal dimensions significantly reduces privacy leakage and offers enhanced privacy assurances.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Colorado > Denver County > Denver (0.04)
- Europe > Romania > Nord-Est Development Region > Bacău County > Bacău (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.99)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)
Last One Standing: A Comparative Analysis of Security and Privacy of Soft Prompt Tuning, LoRA, and In-Context Learning
Wen, Rui, Wang, Tianhao, Backes, Michael, Zhang, Yang, Salem, Ahmed
Large Language Models (LLMs) are powerful tools for natural language processing, enabling novel applications and user experiences. However, to achieve optimal performance, LLMs often require adaptation with private data, which poses privacy and security challenges. Several techniques have been proposed to adapt LLMs with private data, such as Low-Rank Adaptation (LoRA), Soft Prompt Tuning (SPT), and In-Context Learning (ICL), but their comparative privacy and security properties have not been systematically investigated. In this work, we fill this gap by evaluating the robustness of LoRA, SPT, and ICL against three types of well-established attacks: membership inference, which exposes data leakage (privacy); backdoor, which injects malicious behavior (security); and model stealing, which can violate intellectual property (privacy and security). Our results show that there is no silver bullet for privacy and security in LLM adaptation and each technique has different strengths and weaknesses.
- North America > United States > Virginia (0.04)
- North America > United States > Colorado > Denver County > Denver (0.04)
- Europe > Romania > Nord-Est Development Region > Bacău County > Bacău (0.04)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.52)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)
A glass-box interactive machine learning approach for solving NP-hard problems with the human-in-the-loop
Holzinger, Andreas, Plass, Markus, Holzinger, Katharina, Crisan, Gloria Cerasela, Pintea, Camelia-M., Palade, Vasile
The goal of Machine Learning is to automatically learn from data, extract knowledge, and make decisions without any human intervention. Such automatic (aML) approaches show impressive success. Recent results even demonstrate that deep learning applied to the automatic classification of skin lesions performs on par with dermatologists and even outperforms the average. As human perception is inherently limited, such approaches can discover patterns, e.g. that two objects are similar, in arbitrarily high-dimensional spaces, which no human is able to do. Humans can deal only with limited amounts of data, whilst big data is beneficial for aML; however, in health informatics, we are often confronted with a small number of data sets, where aML suffers from insufficient training samples and many problems are computationally hard. Here, interactive machine learning (iML) may be of help, where a human-in-the-loop contributes to reducing the complexity of NP-hard problems. A further motivation for iML is that standard black-box approaches lack transparency and hence do not foster trust and acceptance of ML among end-users. Rising legal and privacy concerns, e.g. with the new European General Data Protection Regulation, make black-box approaches difficult to use, because they are often unable to explain why a decision has been made. In this paper, we present some experiments to demonstrate the effectiveness of the human-in-the-loop approach, particularly in opening the black-box to a glass-box and thus enabling a human to interact directly with a learning algorithm. We selected the Ant Colony Optimization framework and applied it to the Traveling Salesman Problem, which is a good example due to its relevance for health informatics, e.g. for the study of protein folding. Fundamental ML research may also benefit from studies of how humans extract so much from so little data.
- North America > United States > New York (0.04)
- North America > United States > Connecticut (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- (5 more...)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Dermatology (1.00)
- Leisure & Entertainment > Games > Chess (0.68)
- Health & Medicine > Therapeutic Area > Oncology (0.67)
Microsoft Pix update lets you turn your images into art
Microsoft has rolled out an update to its AI-powered photo editing app Pix that lets you turn your iPhone images into art. Pix was originally designed to improve the appearance of photos by tweaking elements such as colour levels and exposure. But Microsoft's latest update, which bears a striking resemblance to the iOS app Prisma, lets you have a little more fun by turning photos into masterpieces.
- Europe > Romania > Nord-Est Development Region > Bacău County > Bacău (0.07)
- North America > United States > Washington > King County > Redmond (0.05)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.05)
- Europe > Portugal (0.05)
Microsoft Pix Camera imitates Prisma with its AI-powered filters
Microsoft Pix Camera uses artificial intelligence to make your pictures of people better. It uses algorithms behind the scenes to analyze the 10 frames it snaps for every picture you take, looking for sharpness, exposure and even facial expressions to make sure you get the very best shot. It even takes good data from the pictures it doesn't use to enhance the photos it chooses. The app, launched last summer and just updated, now offers new filters that can help you make your photos look like real works of art. These artsy filters may sound a lot like what the standalone app Prisma does, but Microsoft's implementation was developed by Microsoft's Asia research lab in collaboration with Skype.
- Asia (0.26)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.06)
- Europe > Romania > Nord-Est Development Region > Bacău County > Bacău (0.06)
Syntactic Analysis Based on Morphological Characteristic Features of the Romanian Language
This paper addresses the syntactic analysis of phrases in Romanian, an important process in natural language processing. We suggest a real-time solution based on using words or groups of words that indicate grammatical category, together with specific endings of certain parts of a sentence. Our idea relies on characteristics of the Romanian language, where some prepositions, adverbs, or specific endings can provide a lot of information about the structure of a complex sentence. Such characteristics can be found in other languages too, such as French. Using a special grammar, we developed a system (DIASEXP) that can carry out a dialogue in natural language, with assertive and interrogative sentences, about a "story" (a set of sentences describing some real-life events).
- Europe > Romania > Nord-Est Development Region > Bacău County > Bacău (0.05)
- North America > United States > Maine (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- (4 more...)
Parallel ACO with a Ring Neighborhood for Dynamic TSP
Pintea, Camelia-M., Crisan, Gloria Cerasela, Manea, Mihai
The current paper introduces a new parallel computing technique based on ant colony optimization for a dynamic routing problem. In the dynamic traveling salesman problem, the distances between cities, interpreted as travel times, are no longer fixed. The new technique uses a parallel model for a problem variant that allows a slight movement of nodes within their neighborhoods. The algorithm is successfully tested on several large data sets.
- North America > United States > New York (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Romania > Nord-Vest Development Region > Maramureș County > Baia Mare (0.04)
- (6 more...)
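For readers unfamiliar with the ant colony optimization family referenced in the entry above, the following is a minimal sketch of a plain, sequential Ant System on a static TSP instance. It is not the parallel ring-neighborhood technique from the paper and does not model moving nodes; the function names (`tour_length`, `ant_system`), the parameter defaults, and the example coordinates are all assumptions chosen for illustration. It also assumes that all points are distinct, so that no pairwise distance is zero.

```python
import math
import random


def tour_length(tour, pts):
    """Length of the closed tour that visits pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[k]], pts[tour[(k + 1) % n]]) for k in range(n))


def ant_system(pts, n_ants=10, n_iters=50, alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """Basic Ant System: returns (best_tour, best_length) found."""
    rng = random.Random(seed)
    n = len(pts)
    dist = [[math.dist(pts[i], pts[j]) for j in range(n)] for i in range(n)]
    tau = [[1.0] * n for _ in range(n)]  # pheromone on each edge
    best_tour, best_len = None, float("inf")

    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            start = rng.randrange(n)
            tour, unvisited = [start], set(range(n)) - {start}
            while unvisited:
                i = tour[-1]
                cand = sorted(unvisited)
                # Attractiveness: pheromone^alpha * (1/distance)^beta
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta for j in cand]
                nxt = rng.choices(cand, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(tour, pts)
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length

        # Evaporate pheromone on every edge, then let each ant deposit
        # an amount inversely proportional to its tour length.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length

    return best_tour, best_len


# Hypothetical instance: the four corners of a unit square.
square = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
best_tour, best_len = ant_system(square)
```

A dynamic variant in the spirit of the abstract would perturb `pts` slightly between iterations and recompute `dist`; that extension is left out here to keep the sketch short.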
Soft Computing approaches on the Bandwidth Problem
Czibula, Gabriela, Crisan, Gloria Cerasela, Pintea, Camelia-M., Czibula, Istvan-Gergely
The Matrix Bandwidth Minimization Problem (MBMP) seeks a simultaneous reordering of the rows and the columns of a square matrix such that the nonzero entries are collected within a band of small width close to the main diagonal. The MBMP is an NP-complete problem, with applications in many scientific domains, linear systems, artificial intelligence, and real-life situations in industry, logistics, and information recovery. Such complex problems are hard to solve, which is why any attempt to improve their solutions is beneficial. Genetic algorithms and ant-based systems are the Soft Computing methods used in this paper to solve some MBMP instances. Our approach is based on a learning agent-based model involving a local search procedure. The algorithm is compared with the classical Cuthill-McKee algorithm and with a hybrid genetic algorithm, using several instances from the Matrix Market collection. Computational experiments confirm a good performance of the proposed algorithms for the considered set of MBMP instances. On a Soft Computing basis, we also propose a new theoretical Reinforcement Learning model for solving the MBMP.
- North America > United States > Hawaii (0.04)
- Europe > Romania > Nord-Vest Development Region > Cluj County > Cluj-Napoca (0.04)
- Europe > Romania > Nord-Vest Development Region > Maramureș County > Baia Mare (0.04)
- Europe > Romania > Nord-Est Development Region > Bacău County > Bacău (0.04)
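To make the objective of the Matrix Bandwidth Minimization Problem in the entry above concrete, here is a minimal sketch that computes the bandwidth of a square matrix under a simultaneous row and column permutation, plus a brute-force search that is feasible only for tiny matrices. It is not the soft computing approach from the paper; the function names (`bandwidth`, `brute_force_min_bandwidth`) and the example matrix are assumptions for illustration.

```python
from itertools import permutations


def bandwidth(matrix, perm=None):
    """Max |i' - j'| over nonzero entries, where i', j' are the positions
    of original row/column i, j after reordering by perm."""
    n = len(matrix)
    if perm is None:
        perm = list(range(n))
    # position[v] is the new index of original row/column v
    position = {v: k for k, v in enumerate(perm)}
    return max(
        (
            abs(position[i] - position[j])
            for i in range(n)
            for j in range(n)
            if matrix[i][j] != 0
        ),
        default=0,
    )


def brute_force_min_bandwidth(matrix):
    """Try every permutation (factorial cost): only for very small matrices."""
    n = len(matrix)
    best = min(permutations(range(n)), key=lambda p: bandwidth(matrix, p))
    return list(best), bandwidth(matrix, best)


# Hypothetical 4x4 example: nonzeros in the corners force a wide band
# until rows/columns 0 and 3 are placed next to each other.
A = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
original_bw = bandwidth(A)
best_perm, best_bw = brute_force_min_bandwidth(A)
```

Because exhaustive search grows factorially with the matrix size, heuristics such as Cuthill-McKee or the metaheuristics named in the abstract are used on realistic instances.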